In [3]:
%matplotlib inline
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib.pyplot as plt
The Series is one of the foundations of pandas, as we saw in the previous video. It’s got a lot of helpful add-ons that bring more expressive power to the NumPy array.
We’re still going to be using randomly generated data in this video. Personally, I always get tired of working with a lot of made-up data, but I promise that we’re going to get to the good stuff very soon. It’s important to cover these bases before you get in over your head. I know it’s helped me a lot.
So let’s get started. You can see we’ve got our standard import, which is the one I’ll be using from here on out. It prints the Python, NumPy, and pandas versions and turns on inline plots, which I’ll get to when we cover plotting.
I’m going to generate 26 random integers between 1 and 20, inclusive.
In [4]:
np.random.seed(125)
# random_integers is deprecated; randint excludes its upper bound, so 21 keeps the range at 1 to 20
raw_np_range = np.random.randint(1, 21, 26)
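If you're wondering what the seed is doing there, here's a small sketch. It assumes np.random.randint with an exclusive upper bound of 21 as the way to draw values from 1 to 20, and the names first_draw and second_draw are just made up for this illustration.

```python
import numpy as np

# Seeding the global generator makes the "random" draw repeatable.
np.random.seed(125)
first_draw = np.random.randint(1, 21, 26)  # upper bound is exclusive, so values are 1..20

# Re-seeding with the same number replays the same sequence.
np.random.seed(125)
second_draw = np.random.randint(1, 21, 26)

print(np.array_equal(first_draw, second_draw))
print(first_draw.min(), first_draw.max())
```

That's why everyone following along with the same seed should see the same numbers, at least on the same NumPy version.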
Now I’ll convert that into a pandas Series. Older versions of pandas offered pd.Series.from_array for this, but it has been deprecated and removed, so I’ll call pd.Series directly.
In [5]:
data = pd.Series(raw_np_range)
Calling pd.Series directly is the common way to do this, and it’s what I’ll be using from now on.
In [6]:
pd.Series(raw_np_range)
Out[6]:
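Typically pandas will do its best to figure out the type of the data that you’re bringing in. When the values don’t share a type, like a string mixed with numbers, it falls back to the generic object dtype.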
In [7]:
pd.Series(['hello',1,1.0])
Out[7]:
If the values are all numeric and at least one of them is a float, pandas will store the whole Series as floats.
In [8]:
pd.Series([1.0,2,3,4,5])
Out[8]:
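To make the type inference a bit more concrete, here's a quick sketch that just checks the .dtype of a few Series. The variable names mixed, numeric, and ints are only for illustration.

```python
import pandas as pd

mixed = pd.Series(['hello', 1, 1.0])    # a string mixed with numbers
numeric = pd.Series([1.0, 2, 3, 4, 5])  # one float among integers
ints = pd.Series([1, 2, 3])             # integers only

print(mixed.dtype)    # generic object dtype
print(numeric.dtype)  # everything is upcast to float
print(ints.dtype)     # stays an integer type
```

Checking .dtype right after you build a Series is a cheap way to catch a column that silently became object.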
You can also pass in an index when you create the Series. This lets us look up values by those labels rather than by position.
In [9]:
pd.Series([1.0,2,3,4,5], index=['a','b','c','d','e'])
Out[9]:
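Here's a small taste of what those labels buy you, as a sketch. The names labeled, value_c, and subset are made up for this example.

```python
import pandas as pd

labeled = pd.Series([1.0, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

value_c = labeled['c']        # look up a single value by its label
subset = labeled[['a', 'e']]  # a list of labels returns a smaller Series

print(value_c)
print(subset)
```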
We can also override the data type, for example converting our original array of integers to floats. We can do this with any of the NumPy data types.
In [10]:
pd.Series(raw_np_range, dtype=np.float16)
Out[10]:
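As a sketch of two ways to control the type: you can pass dtype when you build the Series, or convert afterwards with astype. The values array here is just a stand-in for our random integers.

```python
import numpy as np
import pandas as pd

values = np.array([1, 5, 20])  # stand-in for the random integers

as_float16 = pd.Series(values, dtype=np.float16)  # set the type at construction
as_float64 = pd.Series(values).astype('float64')  # or convert an existing Series

print(as_float16.dtype)
print(as_float64.dtype)
```

Keep in mind that float16 has very limited precision, so it's fine for small integers like these but probably not what you want for real measurements.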
Now that we know how to get it into a Series we can start using some Series commands.
In [11]:
data
Out[11]:
First, it can be helpful to get the size of the Series. We can do this with the .shape property, which returns a tuple, or with len().
In [12]:
data.shape
Out[12]:
In [13]:
len(data)
Out[13]:
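The difference between the two is just the return type, which this little sketch shows on a made-up Series called sample.

```python
import pandas as pd

sample = pd.Series([4, 8, 15, 16, 23, 42])

print(sample.shape)  # a one-element tuple, because a Series is one-dimensional
print(len(sample))   # a plain integer
print(sample.size)   # also a plain integer: the total number of elements
```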
head and tail return the first and last n values of the Series, respectively. By default n is 5.
In [16]:
print(data.head())
print(data.tail())
However, we can ask for any number of items, like 10.
In [17]:
data.head(10)
Out[17]:
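One thing worth knowing is that head and tail hand you back a new, smaller Series rather than just printing. Here's a sketch on a made-up Series of the numbers 1 through 26.

```python
import pandas as pd

sample = pd.Series(range(1, 27))  # 26 values, like our data

first_three = sample.head(3)
last_three = sample.tail(3)

print(first_three.tolist())
print(last_three.tolist())
print(len(sample.head()))  # the default n
```

Because the result is a Series, you can keep chaining methods on it, like sample.head(10).mean().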
Now since we’ve got a list of numbers, we might want to take the mean, median, and mode. We can do that extremely easily with the mean, median, and mode methods.
In [18]:
data.mean()
Out[18]:
In [19]:
data.median()
Out[19]:
In [20]:
data.mode()
Out[20]:
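A small detail: mean and median give you a single number, but mode gives you a Series, because there can be ties. Here's a sketch with a tiny made-up Series where two values tie.

```python
import pandas as pd

sample = pd.Series([1, 1, 3, 3, 7])

print(sample.mean())    # a single number
print(sample.median())  # a single number
print(sample.mode())    # a Series, here holding both tied values
```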
We can also get the count of values. Unlike the shape property, which returns a tuple, count returns a plain integer, and it only counts non-missing values.
In [21]:
data.count()
Out[21]:
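Our random data has no missing values, so count and len agree. Here's a sketch with a made-up Series called with_gap where they don't.

```python
import numpy as np
import pandas as pd

with_gap = pd.Series([1.0, np.nan, 3.0])

print(len(with_gap))     # counts every slot, including the missing one
print(with_gap.count())  # counts only the non-missing values
```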
We can also find the unique values in our Series with the unique method.
In [22]:
data.unique()
Out[22]:
Now if we wanted a frequency distribution, it would be helpful to see all the counts. We can do that with the value_counts method. We can see, like we saw above, that 1, 3, 10, and 13 are all tied for the mode.
In [23]:
data.value_counts()
Out[23]:
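To see how value_counts lines up with mode, here's a sketch on a tiny made-up Series. The names counts and tied are just for illustration.

```python
import pandas as pd

sample = pd.Series([1, 1, 3, 3, 7])

counts = sample.value_counts()  # index holds the values, data holds how often each appears
tied = counts[counts == counts.max()].index.tolist()  # values sharing the highest count

print(counts)
print(sorted(tied))
print(sample.mode().tolist())
```

The values with the highest count are exactly what mode reports.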
We can get a lot of these values, and a good sense of the data, with the describe method. It reports key summary statistics and is one that you’ll likely use every time you start working with a data set.
In [24]:
data.describe()
Out[24]:
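The result of describe is itself a Series, so you can pull individual statistics out by label. Here's a sketch on a made-up Series of five numbers.

```python
import pandas as pd

sample = pd.Series([1, 2, 3, 4, 5])
summary = sample.describe()

print(list(summary.index))  # the labels of the statistics it reports
print(summary['mean'])
print(summary['50%'])       # the 50th percentile is the median
```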
Now while we’re at it, I think this is an appropriate time to show you our first graphical representation. We just created a frequency distribution, or the number of occurrences of each value. It would be helpful to see that graphically as a histogram.
We can make that very simply with the .hist() method.
In [25]:
data.hist()
Out[25]:
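By default hist groups the data into 10 bins, so with values from 1 to 20 each bar covers more than one value. If you want one bar per value, one option is to pass your own bin edges. This is only a sketch: the bin_edges name, the axis labels, and the Agg backend line are my additions, and in a notebook with inline plotting you can drop the matplotlib.use line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd

np.random.seed(125)
sample = pd.Series(np.random.randint(1, 21, 26))  # 26 integers from 1 to 20

# Edges at 0.5, 1.5, ..., 20.5 give each integer from 1 to 20 its own bin.
bin_edges = np.arange(0.5, 21.5, 1.0)
ax = sample.hist(bins=bin_edges)
ax.set_xlabel("value")
ax.set_ylabel("count")

print(len(ax.patches))  # number of bars drawn
```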
Now we’ve got our first graph! On that note we’ll end this video, but I hope you are starting to see how expressive pandas is. In the next video we will cover querying data in pandas Series through lookups, selections, and indexing.